Search results for "information extraction"

showing 10 items of 25 documents

Translingual text mining for identification of language pair phenomena

2016

Translingual Text Mining (TTM) is an innovative natural language processing technology for building multilingual parallel corpora, processing machine translation, contextual knowledge acquisition, information extraction, query profiling, language modeling, contextual word sensing, creating feature test sets, and a variety of other purposes. The keynote lecture will discuss the opportunities and challenges of this computational technology. In particular, it will focus on the identification of language pair phenomena and their application to building a holistic language model, which is a novel tool for processing machine translation, supporting professional translations, evaluation of tran…

Machine translation; Language identification; Computer science; business.industry; 05 social sciences; similarity metrics; 02 engineering and technology; computer.software_genre; 050105 experimental psychology; computational linguistics; multilingual information retrieval; Universal Networking Language; Cache language model; Language technology; 0202 electrical engineering electronic engineering information engineering; Computer-assisted translation; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; information extraction; Language model; Artificial intelligence; business; computer; Language industry; Natural language processing
2016 Sixth International Conference on Innovative Computing Technology (INTECH)
researchProduct

The HisClima database: historical weather logs for automatic transcription and information extraction

2021

Knowing the weather and atmospheric conditions of the past can help weather researchers build models like the ones used to predict how weather conditions are likely to change as global temperatures continue to rise. Many historical weather records, registered on a systematic basis, are available. Historical weather logs were kept on ships on the high seas, recording daily weather conditions such as wind speed, temperature, coordinates, etc. These historical documents are an important source of knowledge from which climatic information from several centuries ago can be extracted. This paper presents a database for researching about the ca…

Database; Computer science; 05 social sciences; 050301 education; 02 engineering and technology; Text recognition; Atmospheric model; computer.software_genre; Wind speed; Information extraction; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Transcription (software); Baseline (configuration management); 0503 education; Relevant information; computer
2020 25th International Conference on Pattern Recognition (ICPR)
researchProduct

Human-in-the-Loop Conversation Agent for Customer Service

2020

This paper describes a prototype system for partial automation of customer service operations of a mobile telecommunications operator with a human-in-the-loop conversational agent. The agent consists of an intent detection system for identifying the types of customer requests that it can handle appropriately, a slot filling information extraction system that integrates with the customer service database for a rule-based treatment of the common scenarios, and a template-based language generation system that builds response candidates that can be approved or amended by customer service operators. The main focus of this paper is on the system architecture and machine learning system structure …

Business requirements; business.industry; Computer science; media_common.quotation_subject; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; Automation; Information extraction; 0202 electrical engineering electronic engineering information engineering; Systems architecture; Human-in-the-loop; 020201 artificial intelligence & image processing; Conversation; Mobile telephony; Dialog system; business; Software engineering; computer; media_common
researchProduct

BIOfid dataset: publishing a German gold standard for named entity recognition in historical biodiversity literature

2019

The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature hidden in German libraries for over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and c…

Biological data; Service (systems architecture); Information retrieval; business.industry; Computer science; 02 engineering and technology; Scientific literature; 010501 environmental sciences; computer.software_genre; 01 natural sciences; language.human_language; Field (computer science); German; Information extraction; Named-entity recognition; Publishing; ddc:020; ddc:570; 0202 electrical engineering electronic engineering information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; 0105 earth and related environmental sciences
researchProduct

Diversity in random subspacing ensembles

2004

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It was shown experimentally and theoretically that in order for an ensemble to be effective, it should consist of classifiers having diversity in their predictions. A number of ways are known to quantify diversity in ensembles, but little research has been done on their appropriateness. In this paper, we compare eight measures of ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…

Computer science; media_common.quotation_subject; Ambiguity; Ensemble diversity; computer.software_genre; Ensemble learning; Data warehouse; Correlation; Information extraction; Knowledge extraction; Statistics; Entropy (information theory); Data mining; computer; media_common
researchProduct

Extraction of Medical Terms for Word Sense Disambiguation within Multilingual Framework

2016

All languages belonging to the same language family share a number of common characteristics, called language pair phenomena, which can be quite useful when processing them for multilingual purposes such as translation across cognate languages, building dictionaries, thesauri, and transcript collections, or multilingual text retrieval of digital documents. In addition, it is estimated that more than 30% of English vocabulary has been inherited from Latin, which has dominated medical terminology in particular. We use this fact to explore word sense disambiguation (WSD) in a multilingual environment. Specifically in the medical domain, language pair phenomena can be limite…

Medical terminology; business.industry; Computer science; similarity metrics; Context (language use); 02 engineering and technology; computer.software_genre; SemEval; Terminology; computational linguistics; multilingual information retrieval; word sense disambiguation; 020204 information systems; Similarity (psychology); 0202 electrical engineering electronic engineering information engineering; medical informatics; 020201 artificial intelligence & image processing; Cognate; Artificial intelligence; information extraction; Language family; business; computer; Natural language processing; Word (computer architecture)
researchProduct

FrameNet CNL: A Knowledge Representation and Information Extraction Language

2014

The paper presents a FrameNet-based information extraction and knowledge representation framework called FrameNet-CNL. The framework is applied to natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology, from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered as belonging to FrameNet-CNL if an information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that Fram…

Information retrieval; Parsing; Knowledge representation and reasoning; business.industry; Computer science; Agency (philosophy); computer.software_genre; Paraphrase; Information extraction; Artificial intelligence; Source text; FrameNet; business; computer; Natural language processing; Natural language
researchProduct

Embedded controlled language to facilitate information extraction from eGov policies

2015

The goal of this paper is to propose a system that can extract formal semantic knowledge representation from natural language eGov policies. We present an architecture that allows for extracting Controlled Natural Language (CNL) statements from heterogeneous natural language texts with the ability to support multilinguality. The approach is based on the concept of embedded CNLs.

Language identification; Natural language user interface; business.industry; Computer science; Natural language programming; computer.software_genre; language.human_language; Information extraction; Universal Networking Language; Controlled natural language; Question answering; language; Artificial intelligence; business; computer; Natural language processing; Natural language
Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services
researchProduct

Cueing animations: Dynamic signaling aids information extraction and comprehension

2013

The effectiveness of animations containing two novel forms of animation cueing that target relations between event units rather than individual entities was compared with that of animations containing conventional entity-based cueing or no cues. These relational event unit cues (progressive path and local coordinated cues) were specifically designed to support key learning processes posited by the Animation Processing Model (Lowe & Boucheix, 2008). Four groups of undergraduates (N = 84) studied a user-controllable animation of a piano mechanism and then were assessed for mental model quality (via a written comprehension test) and knowledge of the mechanism’s dynamics (via a novel non-verbal …

Computer science; Instructional design; Event (computing); Eye movement; Animation; computer.software_genre; Education; Comprehension; Information extraction; Dynamics (music); Developmental and Educational Psychology; Eye tracking; computer; Cognitive psychology
Learning and Instruction
researchProduct

Extracting Semantic Knowledge from Unstructured Text Using Embedded Controlled Language

2016

Nowadays, most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture for an information extraction system based on the concept of Embedded Controlled Language, which allows formal semantic knowledge to be extracted from an unstructured text corpus. Moreover, the presented approach has the potential to support multilingual input and output.

Information retrieval; Concept search; Noisy text analytics; business.industry; Computer science; Text simplification; 010401 analytical chemistry; Text graph; 02 engineering and technology; computer.software_genre; 01 natural sciences; language.human_language; 0104 chemical sciences; Information extraction; Controlled natural language; Knowledge extraction; Explicit semantic analysis; 0202 electrical engineering electronic engineering information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Natural language processing
2016 IEEE Tenth International Conference on Semantic Computing (ICSC)
researchProduct